[kernel] Enhance buildimages.sh and emulator scripts, add DMASEGEND in config.h #2091
…esting Add dosbox.sh for PC-98 image testing Fix qemu.sh for macOS Enhance emu86.sh for use after buildimages.sh
After more careful code inspection, the next step of splitting DMASEG into two parts IMO should be considered more carefully. While there are potentially good future reasons to have a separate always-available protected single-block DMASEG, the current BIOS driver already handles HD I/O separately directly into the requested buffer without disturbing the track cache, unless XMS is on. When XMS is on, the FD I/O has the same issue of requiring DMASEG, not using the track cache but invalidating the track cache. Separating DMA and track cache buffers would give an advantage to combined HD/FD copy operations, but only when XMS is off. I'm not sure what real advantage that has, at an extra dedicated 1K low memory cost.
Also, should either BIOS driver start using DMASEG separately for XMS and not invalidate the cache, this would require an additional check and fmemcpy to update the cache when also writing the block. The DF driver already does this. A potential issue is that for older (slow, non-386) machines, this extra memcpy is quite slow and there's no guarantee the updated track cache contents will ever actually be used. All this possibly feeds into the smaller 6K cache size found by @Mellvik to be optimal (testing on DF driver only).
In any case, separating buffers without more cache and XMS buffer analysis is currently deemed both complicated and at risk of introducing subtle bugs, for which we don't have sufficient regression testing. Thus for the time being, this PR cleans up config.h considerably, which is great, but will likely stop there with regard to multiple I/O buffers. Now the next step will be adding the ability to dynamically set cache size on DF and BIOS, with the ability to turn the cache off on fast systems, for optimal speed seen on real hardware. |
@ghaerr, thank you for some really useful 'loud thinking' about this issue. This is as important on TLVC as it is on ELKS, although possibly for slightly different reasons: We have more DMA devices while BIOS IO is a low priority 'add on', with ELKS it's the other way around. Anyway, here are some considerations that come to mind ('thinking loud' back, and generally ignoring BIOS IO):
I don't agree, first because this is handled by the Chances are there are situations I haven't accounted for when thinking about this. When trying to summarize, I ended up with this list of specific requirements and the platform they belong to:
Most if not all these requirements can be implemented via menuconfig, even the latter two (which never occur concurrently), allocated below the floppy cache. The most interesting (and worst case) scenario would be a 286 AT, which may have XMS (rare, like the Compaq) and floppy cache and Lance ethernet. In this case the Lance DMA bounce would be the first 1k in DMASEG and would double as its XMS bouncer. Next in DMASEG would be a combined floppy DMA/XMS bounce buffer, followed by the fdcache, which in this case would not have to double as DMA bouncer. Total DMASEG 8k. A generic catch-all/cover-all configuration would be to set aside 6k DMASEG, plus 1k if either LANCE or XD is configured. (BTW - implicit assumption: I think it's possible, but I'm not planning to make the LANCE driver available for 8bit ISA). This does indeed sound complicated, but in terms of code it's quite simple and most of the stuff is already in place. My $0.02 (or maybe 0.04...) |
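The 286 AT worst case described above can be written down as a compile-time layout. This is only a sketch: every macro name here is hypothetical (not the actual TLVC or ELKS config.h symbols), and the sizes are the ones quoted in the comment (1k Lance bounce, 1k floppy bounce, 6k cache).

```c
/* Hypothetical layout sketch; names are illustrative, not real config.h symbols. */
#include <assert.h>

#define KB               1024u
#define HAS_LANCE        1      /* assumed: Lance ethernet is configured */
#define FD_CACHE_KB      6u     /* assumed: floppy cache size chosen in menuconfig */

#define LANCE_BOUNCE_SZ  (HAS_LANCE ? 1u * KB : 0u) /* DMA bounce, doubles as XMS bounce */
#define FD_BOUNCE_SZ     (1u * KB)                  /* combined floppy DMA/XMS bounce */
#define FD_CACHE_SZ      (FD_CACHE_KB * KB)         /* track cache, no bounce duty here */

/* Byte offsets from the start of DMASEG, in the order given in the comment */
#define LANCE_BOUNCE_OFF 0u
#define FD_BOUNCE_OFF    (LANCE_BOUNCE_OFF + LANCE_BOUNCE_SZ)
#define FD_CACHE_OFF     (FD_BOUNCE_OFF + FD_BOUNCE_SZ)
#define DMASEG_TOTAL     (FD_CACHE_OFF + FD_CACHE_SZ)

/* Fails the build if the pieces stop adding up to the quoted 8k */
static_assert(DMASEG_TOTAL == 8u * KB, "286 AT worst case should total 8K");

/* Size in 16-byte paragraphs, the unit used for real-mode segment arithmetic */
static unsigned dmaseg_paragraphs(void)
{
    return DMASEG_TOTAL >> 4;
}
```

Setting HAS_LANCE to 0 in this sketch drops the total to 7k, so the static assertion would need to follow the configuration in a real header.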
Great comments @Mellvik, agreed on everything you're saying, and only realized after writing the post that XMS isn't applicable to PC/XT systems (nor 286 systems, since the LOADALL method of setting the shadow GDT registers isn't implemented). I'm still convinced it's a complicated decision on what changes to make going forward, and am continuing to try to arrange the matrix before starting changes. Thinking more and especially after reading your post, I had the following additional thoughts/problems to consider:
Is the XD driver asynchronous? Lance is synchronous, right? A final important point is that for 386+ systems with cache turned off, XMS should always be ON. Almost every 386+ system will have XMS memory available, and it doesn't make much sense to try to improve I/O speeds for 386 systems when with XMS one can have up to 2500 buffers (2.5M, almost twice a full floppy) of data cached. For this reason I'm considering rethinking the decision to have XMS turned off in the default shipping configuration. I can't recall but think we might have had a couple early XMS issues with some 386 systems with regards to A20 line management.
It's still complicated! I can't quite get my hands around how important dedicated low memory is for compiled-in drivers. We could switch to using seg_alloc and allocate everything out of main memory w/ALIGN1K, now that I think of it. If the driver were opened at or very near boot, this might pack all the DMASEG and track caches together, with no fragmentation. Perhaps I should test that concept.
[EDIT: The track cache, being larger than 1K, can't be allocated w/ALIGN1K to prevent address wrap, so the floppy driver(s) can't use seg_alloc. But could be better idea for dedicated buffers for other drivers requiring only 1K buffers.] Thank you! |
Thanks @ghaerr, this is useful. I'll ponder your comments overnight :-), more comments tomorrow. For now just a couple of things.
Actually, and as mentioned in my post, the 286 does implement LOADALL and works fine with XMS - vendor permitting. I have no clue about the magic behind this (you implemented it), but XMS buffers work fine on the Compaq 286 Portable III - 2500 buffers and all.
All interrupt driven drivers are asynchronous. That includes the network drivers in ELKS. Lance doesn't work yet, but if possible, it's even more async than the other network drivers because it's using dma. All the others are PIO. |
That's undoubtedly because XMS on those machines is configured and implemented via the extended INT 15 protected block move (not unreal mode), which happens to be required on all Compaq BIOSes, because the BIOS itself runs in protected mode, which interferes with unreal mode. I'm aware of LOADALL; it is required on 286 (and still would not work on Compaq). My point is that ELKS doesn't implement it. ELKS only implements XMS via unreal mode or INT 15.
No - they don't all accept requests "asynchronously at interrupt time", which was my definition of "asynchronous". Divided into two classifications - drivers using interrupts (many) and drivers that accept I/O requests from an interrupt handler (DF & SSD) - none of the network drivers do the latter. The entire TCP/IP network subsystem is synchronous, and has to wait for ktcp to process each request before starting another (ktcp technically hangs in select so it can process another "request", but only gets one application request at a time, and there's only one NIC open at a time, and only one /etc/tcpdev file descriptor). Also, none of the network drivers have a request queue. An async driver can't return a direct result code to its caller, unless the caller is an async subsystem itself (only the block I/O subsystem is).
Given that explanation, I'm assuming the TLVC XD driver is async, DF of course is, and the TLVC HD driver is synchronous, along with ELKS BIOS FD and HD. The Lance driver puts the calling process to sleep until DMA is complete, not able to accept another request in the meantime, right? So it's synchronous also. All NIC drivers receive requests through the character filesystem sub driver read/write entry point, which require a result to be returned to the caller synchronously.
So what I'm talking about is the ability for asynchronous I/O requests to be received (e.g. getting the next actual I/O request at interrupt time), or not - because as I mentioned above, "asynchronous" drivers can't share a buffer with anything else, since they can't put the calling process to sleep if not available. We can't reliably share DMA segments between async and sync drivers, but sync drivers could share a DMA segment, providing their read/write entry points won't ever be called/in-use simultaneously. I believe this means sync disk drivers could share DMA, but only amongst themselves, and only if the driver doesn't sleep a process.
Since the NIC drivers all sleep the calling process (which is always ktcp) they can likely share a DMA segment until we implement multiple NIC cards in use at the same time. (Is that called multi-homing?) Complicated and messy. Still trying to figure whether statically allocate, share, or dynamically allocate DMA buffers given the issues above. |
OK, I remember the INT15 trick - doing the job of switching from real mode to virtual mode (possibly unreal mode?) and back. Very useful. My point was that XMS is indeed available on (many) 286 systems, which is relevant for this discussion: Which system groups must have XMS bounce buffers.
Actually I don't understand your definition and I don't think this is the optimal venue for that discussion. But I'm sure we can agree that the most fundamental prerequisite for asynchronicity in this context is the availability of buffers. The storage IO system achieves asynchronicity by passing buffers back and forth, allocating, releasing, queuing, waiting, sending wakeups etc. In most OSes there is a similar system for network requests, which TLVC and ELKS don't have and cannot reasonably afford. So By introducing buffers at the driver level, the network subsystem becomes completely asynchronous even though While (possibly) interesting in itself, the asynchronicity and drivers issue seems completely irrelevant to this discussion (about how to use the precious resource of low memory RAM). What I suggested in my previous post is to not share DMA segments, since there is no reason to:
There are conveniently compile time definitions/configurations, very simple. More about the DMASEG allocation in the next post. Thank you. |
I agree, this is in line with my suggestion. And - If I'm thinking clearly - if there is an XMS bounce buffer, there is no need for a DMA bounce.
It is. And this calculation is already in place in TLVC, using the menuconfig'ured
For floppy this may be the case, thinking that cases where floppies are ROOT devices or continuously mounted will be rare. I was thinking about adding such a flag to
Agreed.
I'm assuming 'not in use' means 'not configured'. And I do see this complication (although well hidden in a config.in file...). Another case for either/or: I did allow mixing direct and BIOS drivers for a while in TLVC, then ditched it for exactly this reason: DMASEG sharing became too complicated (or dangerous if not covering all the bases).
I agree and - although I have not used XMS for a very long time - I think it should be the default (the additional code is likely minimal). When I did use it, the use was extensive - on 286 and 386, and I do believe it's safe. That said, I came to use no more than 100 XMS buffers because of the syncing - blowing away the IO subsystem for elongated periods. A different discussion - again.
It's an interesting idea, but keep in mind that whatever is done/chosen along the alternatives discussed here, is still less than the FloppyTrackBuffer we've had since forever. Thank you. |
Agreed, and my choice of using the word "asynchronous" to describe both the workings of the method(s) of requesting I/O as well as the inner workings of various drivers falls short. As well as going off on a tangent about it all. Useful discussion though as I'd forgotten about some earlier conversations about how adding a buffer pool to the NIC drivers might increase throughput. Which begs the question of even more buffers and where to put them... We can leave that to another day. Thanks for all your other comments about DMA, XMS and XT vs 386. (And I think 286 will fall where it may after your continued testing - either treated like XT or 386 for purposes of caching and associated buffer management). I'll try summarizing (again) what might work for ELKS and TLVC, as I'm still interested in a common driver interface between the systems.
That's it. The separation of a DMASEG from the floppy driver(s) entirely (since they'll use their own track cache) costs 1K bytes, but keeps things simple by not having the floppy driver(s) get too far into the way of the other drivers' buffer requirements, and allows for the possibility of configuring a single extra shared 1K buffer for DMA or XMS.
Have you seen the mechanism used in the ELKS DF driver for configuring its DMA and track cache buffers? This compile-time configurable method allows for using a separate or combined DMA buffer in or outside the track cache (for example purposes):
The segment and offset are defined separately for cache and bounce buffers, allowing any memory area to be segmented or overlapped for use as cache or DMA buffer. Later in the driver, the track cache is invalidated only if the track cache is shared with the DMA buffer:
This is just an example of a possible means of allowing a driver's buffers to be fully configurable outside the normal configuration mechanism.
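As an illustration of the idea only (this is not the ELKS DF driver code, and all names below are hypothetical), a cache and a bounce buffer can each be described by their own segment, offset and length, with the cache invalidated only when the two regions actually overlap:

```c
/* Hypothetical sketch; not the actual ELKS DF driver macros or functions. */
struct farbuf {
    unsigned seg;   /* real-mode segment (units of 16 bytes) */
    unsigned off;   /* byte offset within the segment */
    unsigned len;   /* length in bytes */
};

/* 20-bit linear address of the first byte of a far buffer */
static unsigned long linear(const struct farbuf *b)
{
    return ((unsigned long)b->seg << 4) + b->off;
}

/* Nonzero if the two buffers share at least one byte of memory */
static int overlaps(const struct farbuf *a, const struct farbuf *b)
{
    return linear(a) < linear(b) + b->len &&
           linear(b) < linear(a) + a->len;
}

static int cache_valid = 1;

/* Called before the bounce buffer is used for a DMA or XMS transfer */
static void bounce_prepare(const struct farbuf *cache,
                           const struct farbuf *bounce)
{
    if (overlaps(cache, bounce))
        cache_valid = 0;    /* bounce data will overwrite cached sectors */
}
```

With compile-time constants for both descriptors, a compiler can usually fold the overlap test away, so a separated configuration would pay nothing for the check.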
Nice idea! I'll use it instead of trying to enhance heap_alloc, as the fragmentation result would likely be equal, with the exception of a possible "first-fit" rather than "best-fit" initial placement, as is used now.
Agreed. The TLVC configuration will remain a bit more complicated depending on the final choice made for the driver's buffer allocations, but should all still be quite workable. I'm thinking ultimately what we'll really want is a Configuration Guide, or at least some documentation of what options are available on a per-driver basis, if this ever comes up.
Yes - although probably the other way around: if a DMA buffer is required, it can always also be used for XMS (but not vice versa). Our agreed point is that in all cases, the current drivers use the same buffer for DMA and XMS, I think.
Yes, except that for ELKS I think a very useful use case will be using the DF driver as standard simultaneously while allowing the use of the BIOS driver for HD access. This is especially important since the TLVC HD driver(s) are under heavy development and aren't really ready for release yet. By allowing DF and BIOS HD along with a separate DMASEG for BIOS HD, the next ELKS release could move to DF with continued HD reliability. There is some question as to whether the BIOS itself requires a DMA-wrap buffer when requesting HD I/O, but I'm thinking it's not worth taking any chances as to whether PIO is always used internally, as the BIOS could be doing anything.
Geez - that's a serious problem. I've made a note of it. I'm thinking something along the lines of writing a What do you think, do the above bullet points cover what we need for the next step in separating DMASEG/CACHE buffers into two, or is there more that needs to be added? |
Sounds good! Agreed.
Pretty neat, I didn't think about that one. That said, I keep thinking that the rarity of a DMA wrap would make it perfectly reasonable to always share the cache and bounce buffer for that purpose and just invalidate the cache and go on in the rare cases when wrap happens. I may have taken it to the extreme in terms of simplicity (and there is no provision for BIOS HD DMA wrap here), but this is the setup I'm testing right now:
XD disk and LANCE share the same bounce buffer since they never occur concurrently. Floppy is always there and at least a 1k bounce/wrap buffer. The rest are using
Yes, I think that's necessary - even for ourselves. It's hard to remember all the variants and why this was chosen instead of that - etc.
Yes - that's my perception too :-)
The IDE driver has been stable for a long time, but you're still right in the sense that the XT-IDE stuff has been added recently. Next step - interrupts - is a big change too, and I'm considering leaving the current driver in place and letting them live alongside each other while testing. Makes life easier in many ways.
There is always the XD (MFM drive) case even with BIOS IO, but if you give the HD driver a DMA safe bounce buffer, it would act as a DMA bouncer for XT class systems (no XMS there), and an XMS bouncer for AT+. The 1k in low mem is cheap for the convenience and the memory has to come from somewhere anyway...
|
Agreed that DMA and XMS should always be able to use the same buffer. What the above code does is allow the DF driver to separate its own DMA/XMS buffer out of the track cache or not - currently the DF driver DMA/XMS buffer is always within its track cache. I plan on doing the same sort of thing in the BIOS FD/HD driver, which will allow either driver to be compiled to include or not include its DMA/XMS buffer within its track cache, to make future configuration(s) easy while we're in the depths of this. The important part though, after careful inspection of the BIOS FD/HD code - is that ELKS can with a few changes be set up to function with the BIOS, DF or both drivers (permanently allowing either driver to be configured at any time). The enhancement would be configuring the BIOS driver to use a separate DMA/XMS buffer from its track cache and configuring the DF driver to have its DMA/XMS buffer within its track cache. Since the DF driver is "async" (explained in long detail above), it can't share a DMA/XMS buffer with any other driver. But since the BIOS driver is "sync" and BIOS FD will never be used simultaneously with DF, having a separate DMA/XMS buffer for the BIOS driver allows the BIOS HD driver to co-exist with DF and work great. This also solves the ability to access old XD drives as you point out, without having to port the XD driver over. The BIOS DMA/XMS buffer will also be available to be shared with other "sync" drivers. I'm putting together a PR, you'll see the details.
Yes, and this will also work for the BIOS HD driver as described above. I'm going to keep the name of the shareable/sync etc DMA/XMS buffer DMASEG (as you have) which was its original purpose, and rename all other uses to something else (TRACKSEG etc) similar to what you might be doing.
Yes, I was thinking similarly, but hope to avoid the
There are more issues to be resolved for sharing drivers between TLVC and ELKS. Happy to talk more about them when ready, I would suggest we try getting the NIC drivers shared first, as less complications and the API is possibly the same. |
Oops; I forgot to add in that one - thanks for the reminder.
That's a good plan. For TLVC all drivers are considered async (IDE driver soon), and need their own bounce buffer. That may sound like a lot but isn't. The FD driver gets its own, the XD or Lance get one and that's it. The non-DMA drivers use |
IMO, the ability for continued experimentation is a good thing, especially up to at least a full track limit on 1440k drives. Why artificially limit what a user wants to configure, unless it breaks the system? |
Many reasons, actually - the most important that rigorous testing has shown a larger cache is not meaningful. Then there are
A KB here and a KB there. BTW the upcoming directfd driver update introduces conditionals around the cache code so it will automatically be removed if CONFIG_FLOPPY_CACHE is 0. Not a KB, maybe 150 bytes ... |
Sorry, I misunderstood your statement. Agreed the floppy cache should be set by default to the optimum results from your testing. I plan on following suit after making small updates to the DF driver. The BIOS driver is a bit more complicated and may be better left as-is, since its cache design is less amenable to a fixed-size cache starting at a particular sector number with MT track wrap. Current test results show a 6k cache gives best results, or is it a different number?
Totally agree! |
It's safe to say that 6 is best. 7 is marginally better for some combinations but the margin is below noticeable. I've chosen 6 as the default in menuconfig, above 7 will become 7 in config.h. CONFIGuring out the floppy cache turned out to save ~400 code bytes, 32 data bytes :-) |
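A minimal sketch of how such a limit could look in a config header, assuming the menuconfig value arrives as CONFIG_FLOPPY_CACHE in KB. Only CONFIG_FLOPPY_CACHE is named in this thread; the other identifiers are hypothetical and this is not the actual TLVC config.h.

```c
/* CONFIG_FLOPPY_CACHE comes from the thread; all other names are assumed. */
#ifndef CONFIG_FLOPPY_CACHE
#define CONFIG_FLOPPY_CACHE 6           /* assumed menuconfig default, in KB */
#endif

#define FD_CACHE_MAX_KB 7               /* hypothetical name for the upper limit */

#if CONFIG_FLOPPY_CACHE > FD_CACHE_MAX_KB
#define FD_CACHE_KB FD_CACHE_MAX_KB     /* larger requests are clamped to 7 */
#else
#define FD_CACHE_KB CONFIG_FLOPPY_CACHE
#endif

#if FD_CACHE_KB > 0
#define FD_CACHE_BYTES (FD_CACHE_KB * 1024u)
/* cache lookup and fill code would be compiled in here */
static int fd_cache_enabled(void) { return 1; }
#else
#define FD_CACHE_BYTES 0u
/* cache code is compiled out entirely when the size is 0 */
static int fd_cache_enabled(void) { return 0; }
#endif
```

Doing the clamp in the preprocessor keeps the size a compile-time constant, which is what lets the conditional blocks drop the cache code and data when the value is 0.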
This work is a precursor to splitting DMASEG/DMASEGSZ into two buffers; a bounce buffer for XMS or 64k-wrap I/O, and a floppy disk track cache buffer for reading contiguous sectors used in the BIOS and DF drivers.
Currently the shared use of this area for a bounce buffer causes the track cache to be invalidated since both use the same DMASEG start address in low memory. In addition, the BIOS HD driver has to invalidate the floppy track cache since it also performs sector I/O using DMASEG. By splitting off a separate DMASEG from CACHESEG, neither the BIOS HD driver nor BIOS FD or direct DF driver will have to invalidate the cache for XMS or HD I/O.
While this will cost 1K of low memory, testing and enhancements being performed by @Mellvik in Mellvik/TLVC#88 are showing that best floppy cache performance is obtained by disabling the cache entirely for 386+ (fast) computers and using an approx 6K cache for (other, slow) PC/XT systems. This saves 3K from the current max track cache of 9K (including a shared bounce buffer). In the case of a disabled cache, 6K could be released to the kernel for general memory use.
The DMASEG buffer could be shared across the BIOS FD and DF drivers, since the system won't work with both simultaneously. As mentioned above, the BIOS HD and FD driver can share DMASEG since the I/O is always synchronous. However, sharing DMASEG between BIOS HD and direct DF won't work and is currently a major bug, as the DF driver can receive async requests during BIOS HD I/O that require the simultaneous use of DMASEG when XMS is in use, or when I/O is requested into a non-wrap-protected 1K L1 buffer.
Without XMS, there shouldn't be contention since the DF driver won't receive any I/O requests with 64k address wrapping since all L1 and L2 non-XMS buffers are wrap-protected. [EDIT: Actually only L2 buffers are wrap protected, as they're allocated using seg_alloc's SEG_FLAG_ALIGN1K. L1 buffers are allocated via heap_alloc which doesn't yet have the ability to align an allocation on a block boundary.] Should the DF driver be enhanced to handle raw device requests, address wrap could occur unless the upper level splits the I/O into multiple segments (preferred).
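To make the wrap condition concrete, here is a small sketch with hypothetical helper names (these are not kernel functions). A transfer wraps when its first and last bytes fall in different 64K physical pages. Because 64K is a multiple of 1K, a 1K buffer that starts on a 1K boundary can never wrap, while a larger buffer such as a 6K track cache still can.

```c
/* Hypothetical helpers for illustration; not part of the ELKS or TLVC kernel. */

/* Physical (linear) address of a real-mode seg:off pair */
static unsigned long phys_addr(unsigned seg, unsigned off)
{
    return ((unsigned long)seg << 4) + off;
}

/* Nonzero if [addr, addr+len) crosses a 64K physical boundary.
 * len must be greater than 0. */
static int dma_wraps(unsigned long addr, unsigned long len)
{
    return (addr >> 16) != ((addr + len - 1) >> 16);
}

/* Round a physical address up to the next 1K boundary */
static unsigned long align1k(unsigned long addr)
{
    return (addr + 1023UL) & ~1023UL;
}
```

For example, a 1K buffer at segment 0x0FFF offset 0 starts at physical 0xFFF0 and wraps, whereas the same buffer moved up to the next 1K boundary (0x10000) does not.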
When HD and DF compete, the solution is probably a kernel mutex around DMASEG; that will be addressed after these next enhancements.
The kernel code segment start address REL_SYSSEG is now calculated in this PR, potentially causing trouble without testing ROM and PC98 versions heavily. In order to more quickly test this on all systems (IBM PC, PC-98, 8088 ROM and 8018X ROM) the buildimages.sh script has been enhanced to allow for rapid compilation of kernels only using:
./buildimages.sh fast
Testing of that showed deficiencies in qemu.sh and emu86.sh (for ROM desktop emulation) which are updated, and dosbox.sh is added for PC-98 testing. A new file pc98-1232-nc.config is added to produce a non-compressed image for speed.